95 research outputs found

    A Note on Linear Time Algorithms for Maximum Error Histograms

    Get PDF
    Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, e.g., V-Optimal, use error measures which are the sums of a suitable function, e.g., square, of the error at each point. Although the best-known algorithms for solving these problems run in quadratic time, a sequence of results have given us a linear time approximation scheme for these algorithms. In recent years, there have been many emerging applications where we are interested in measuring the maximum (absolute or relative) error at a point. We show that this problem is fundamentally different from the other traditional nonl∞ error measures and provide an optimal algorithm that runs in linear time for a small number of buckets. We also present results which work for arbitrary weighted maximum error measures

    Special Section on the International Conference on Data Engineering 2015

    Get PDF
    The papers in this special section were presented at the 31st International Conference on Data Engineering that was held in Seoul, Korea, on April 13-17, 2015. 17, 2015

    Message from the ICDE 2015 Program Committee and general chairs

    Get PDF
    Since its inception in 1984, the IEEE International Conference on Data Engineering (ICDE) has become a premier forum for the exchange and dissemination of data management research results among researchers, users, practitioners, and developers. Continuing this long-standing tradition, the 31st ICDE will be hosted this year in Seoul, South Korea, from April 13 to April 17, 2015. It is our great pleasure to welcome you to ICDE 2015 and to present its proceedings to you

    From Large Language Models to Databases and Back: A discussion on research and education

    Full text link
    This discussion was conducted at a recent panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023), held April 17-20, 2023 in Tianjin, China. The title of the panel was "What does LLM (ChatGPT) Bring to Data Science Research and Education? Pros and Cons". It was moderated by Lei Chen and Xiaochun Yang. The discussion raised several questions on how large language models (LLMs) and database research and education can help each other and the potential risks of LLMs.Comment: 7 pages, 2 figures, the Panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023

    Mining Optimized Support Rules for Numeric Attributes

    No full text
    In this paper, we generalize the optimized support association rule problem by permitting rules to contain disjunctions over uninstantiated numeric attributes. For rules containing a single numeric attribute, we present a dynamic programming algorithm for computing optimized association rules. Furthermore, we propose a bucketing technique for reducing the input size, and a divide and conquer strategy that improves the performance significantly without sacrificing optimality. Our experimental results for a single numeric attribute indicate that our bucketing and divide and conquer enhancements are very effective in reducing the execution times and memory requirements of our dynamic programming algorithm. Furthermore, they show that our algorithms scale up almost linearly with the attribute's domain size as well as the number of disjunctions. 1 Introduction Association rules, introduced in [AIS93], provide a useful mechanism for discovering correlations among the underlying data. In it..

    PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning

    No full text
    Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. These classifiers first build a decision tree and then prune subtrees from the decision tree in a subsequent pruning phase to improve accuracy and prevent "overfitting". In this paper, we propose PUBLIC, an improved decision tree classifier that integrates the second "pruning" phase with the initial "building" phase. In PUBLIC, a node is not expanded during the building phase, if it is determined that it will be pruned during the subsequent pruning phase. In order to make this determination for a node, before it is expanded, PUBLIC computes a lower bound on the minimum cost subtree rooted at the node. This estimate is then used by PUBLIC to identify the nodes that are certai..
    • …
    corecore